Analytic combinatorics for bioinformatics I: seeding methods
Abstract
Seeding heuristics are the most widely used strategies to speed up sequence alignment in bioinformatics. Such strategies are most successful when they are calibrated, so that the speed-versus-accuracy trade-off can be properly tuned. In the widely used case of read mapping, it has so far been impossible to predict the success rate of competing seeding strategies, for lack of a theoretical framework. Here I present an approach to estimate such quantities based on the theory of analytic combinatorics. In a nutshell, the strategy is to specify a combinatorial construction of the reads where the seeding heuristic fails, translate this specification into a generating function using formal rules, and finally extract the probabilities of interest from the singularities of the generating function. I use this approach to construct simple estimators of the success rate of the seeding heuristic under different types of sequencing errors. I also show how the analytic combinatorics strategy can be used to compute the associated type I and type II error rates (mapping the read to the wrong location, or being unable to map the read). Finally, I show how analytic combinatorics can be used to estimate average quantities, such as the expected number of errors in reads where the seeding heuristic fails. Overall, this work introduces a theoretical and practical framework to find the success rate of seeding heuristics and to address related problems in bioinformatics.
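The three-step strategy described in the abstract (specify the reads where seeding fails, translate to a generating function, extract probabilities) can be illustrated with a minimal sketch. The model here is an assumption chosen for illustration, not taken from the abstract: errors are independent substitutions with rate `p`, and a seed is any error-free stretch of at least `gamma` nucleotides. Under that assumption, a read without a seed is a sequence of error-free runs shorter than `gamma`, separated by errors, which gives the weighted generating function W(z) = F(z) / (1 - p z F(z)) with F(z) = 1 + qz + ... + (qz)^(gamma-1) and q = 1 - p. The sketch extracts coefficients exactly through the linear recurrence implied by W(z) rather than through singularity analysis; the function names `no_seed_probability` and `brute_force` are hypothetical.

```python
from itertools import product


def no_seed_probability(k, gamma, p):
    """Probability that a read of k nucleotides contains no error-free
    run of length >= gamma (assumed model: independent errors, rate p).

    Computes the coefficient of z^k in W(z) = F(z) / (1 - p*z*F(z)),
    using the recurrence w_n = f_n + p * sum_{j<gamma} q^j * w_{n-1-j}.
    """
    q = 1.0 - p
    w = []
    for n in range(k + 1):
        # f_n: the read is a single error-free run shorter than gamma.
        f_n = q ** n if n < gamma else 0.0
        # Otherwise the read starts with a run of j matches and one error.
        tail = sum(q ** j * w[n - 1 - j] for j in range(min(gamma, n)))
        w.append(f_n + p * tail)
    return w[k]


def brute_force(k, gamma, p):
    """Same probability by enumerating all 2^k error patterns (small k only)."""
    q = 1.0 - p
    total = 0.0
    for errors in product([0, 1], repeat=k):
        run = longest = 0
        for e in errors:
            run = 0 if e else run + 1
            longest = max(longest, run)
        if longest < gamma:
            n_err = sum(errors)
            total += p ** n_err * q ** (k - n_err)
    return total


if __name__ == "__main__":
    # Hypothetical parameters: 50-nucleotide read, seed length 17, 5% errors.
    print(no_seed_probability(50, 17, 0.05))
```

The recurrence runs in time proportional to `k * gamma`, whereas the brute-force check is exponential in `k` and is only there to validate the construction on short reads.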
Similar resources
The Order Steps of an Analytic Combinatorics
Analytic combinatorics aims to enable precise quantitative predictions of the properties of large combinatorial structures. This theory has emerged over recent decades as essential both for the analysis of algorithms and for the study of scientific models in many disciplines, including probability theory, statistical physics, computational biology and information theory. With a caref...
An Invitation to Analytic Combinatorics
ANALYTIC COMBINATORICS is primarily a book about combinatorics, that is, the study of finite structures built according to a finite set of rules. Analytic in the title means that we concern ourselves with methods from mathematical analysis, in particular complex and asymptotic analysis. The two fields, combinatorial enumeration and complex analysis, are organized into a coherent set of methods ...
Analytic combinatorics for a certain well-ordered class of iterated exponential terms
The aim of this paper is threefold: firstly, to explain a certain segment of ordinals in terms which are familiar to the analytic combinatorics community, secondly to state a great many of associated problems on resulting count functions and thirdly, to provide some weak asymptotic for the resulting count functions. We employ for simplicity Tauberian methods. The analytic combinatorics communit...
A Tutorial of Recent Developments in the Seeding of Local Alignment
We review recent results on local alignment. We begin with a review of classical methods and early heuristic methods, and then focus on more recent work on the seeding of local alignment. We show that these techniques give a vast improvement in both sensitivity and specificity over previous methods, and can achieve sensitivity at the level of classical algorithms while requiring orders of magni...
Seeder: discriminative seeding DNA motif discovery
MOTIVATION: The computational identification of transcription factor binding sites is a major challenge in bioinformatics and an important complement to experimental approaches. RESULTS: We describe a novel, exact discriminative seeding DNA motif discovery algorithm designed for fast and reliable prediction of cis-regulatory elements in eukaryotic promoters. The algorithm is tested on biologica...